A Flexible Convex Optimization Model for Semi-supervised Clustering with Instance-level Constraints
Abstract
Clustering is a common task in many applications, e.g., digital image processing, text mining, and bioinformatics. Many techniques, such as k-means, hierarchical clustering, and spectral clustering, have been proposed. In a previous study, we proposed a quadratic programming model to address the fuzzy binary clustering problem in the unsupervised setting and then extended it to the general clustering problem. In this paper, we further extend the model to the semi-supervised setting. The extended model has three salient characteristics. First, both the label and link information of known samples can be integrated easily. Second, it captures the linkage between hard binary clustering and fuzzy binary clustering in one framework, which theoretically suggests the benefits of fuzzy binary clustering. Third, a fast iterative algorithm is proposed that can be applied to very large data sets. Numerical experiments on two data sets suggest its practical effectiveness and efficiency.
Similar resources
An Effective Semi-Supervised Clustering Framework Integrating Pairwise Constraints and Attribute Preferences
Both instance-level knowledge and attribute-level knowledge can improve clustering quality, but how to use both effectively is an essential problem. This paper proposes a wrapper framework for semi-supervised clustering, which aims to gracefully integrate both kinds of prior knowledge in the clustering process, the instance level...
Extracting Prior Knowledge from Data Distribution to Migrate from Blind to Semi-Supervised Clustering
Although many studies have been conducted to improve clustering efficiency, most state-of-the-art schemes suffer from a lack of robustness and stability. This paper aims to propose an efficient approach to elicit prior knowledge, in the form of must-link and cannot-link constraints, from the estimated distribution of raw data in order to convert a blind clustering problem into a semi-supervised o...
On the Comparison of Semi-Supervised Hierarchical Clustering Algorithms in Text Mining Tasks
Semi-supervised clustering approaches have emerged as an option for enhancing clustering results. These algorithms use external information to guide the clustering process. In particular, semi-supervised hierarchical clustering approaches have been explored in many fields in recent years. These algorithms provide efficient and personalized hierarchical overviews of datasets. To the best of th...
Semi-supervised Clustering by Input Pattern Assisted Pairwise Similarity Matrix Completion
Many semi-supervised clustering algorithms have been proposed to improve the clustering accuracy by effectively exploring the available side information that is usually in the form of pairwise constraints. However, there are two main shortcomings of the existing semi-supervised clustering algorithms. First, they have to deal with non-convex optimization problems, leading to clustering results t...
Composite Kernel Optimization in Semi-Supervised Metric
Machine-learning solutions to classification, clustering, and matching problems critically depend on the adopted metric, which in the past was selected heuristically. In the last decade, it has been demonstrated that an appropriate metric can be learnt from data, resulting in superior performance compared with traditional metrics. This has recently stimulated considerable interest in the to...
Publication date: 2011